Aims

Here I aim to demonstrate the use of our work to model the arterial input function (AIF) for PET imaging using count data. First, I will go through how we arrange the data for these models. Next, I will demonstrate the application of each model to this data.

Preparation

Libraries

We’ll load the model functions, as well as some helper functions which we’ll use along the way.

source("model_functions.R")
source("helper_functions.R")

Loading data

I have saved the prepared AIF data in the Rawdata folder. This dataset consists of 10 patients.

# load data
filenames = list.files(path = "../Rawdata", pattern="*.csv") ##name of files
data = tibble(
  patient = substr(filenames,1,5), # the first 5 characters of the filenames
  count = map(filenames,~read.csv(paste0("../Rawdata/", .)) %>% select(-X))
) %>% 
  mutate(count = map(count, ~.x %>% 
                          mutate(Discrete=Method=="Discrete")))

Data

General Approach

Conventionally, the input function is modelled by first converting all of the count data into radioactivity. Estimates of the whole-blood-to-plasma ratio (BPR) are derived from blood and plasma radioactivity values, which are either modelled or interpolated from a few data points; estimates of the plasma parent fraction are similarly either interpolated or modelled from measured values.

In this approach, we model the input function using the original count data, i.e. before converting the data to radioactivity. However, we still require estimates of the BPR and parent fraction over time. For this reason, the data must firstly be processed in a conventional manner with radioactivity to derive estimates of the BPR and plasma parent fraction, as well as for assessing the effects of dispersion correction.

The data below therefore contain both the raw count data and processed data generated by converting the counts to radioactivity. The strategy of this model is to make use of the counts to model the data with a more appropriate error distribution, but this requires first processing the data in the conventional manner, and then returning to the counts when applying the model itself.
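To illustrate the motivation with a toy sketch (simulated values, not our data): raw Poisson counts have variance roughly equal to their mean, but dividing them by per-sample conversion factors, as happens in the conventional radioactivity processing, distorts this mean-variance relationship.

```r
# Toy illustration (simulated, not our data): raw Poisson counts satisfy
# variance ~ mean, but dividing by varying per-sample conversion factors
# (as in the conventional radioactivity conversion) breaks this relationship,
# which is why we keep the counts for the model itself.
set.seed(1)
counts   <- rpois(1e4, lambda = 100)
scaling  <- runif(1e4, 0.5, 2)   # hypothetical per-sample conversion factors
activity <- counts / scaling

var(counts) / mean(counts)       # close to 1, as expected for Poisson data
var(activity) / mean(activity)   # far from 1
```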

Example Data

We’ll start out by looking at one of the individuals within the dataset. The data is as follows:

subject_1 = data$count[[1]] %>% 
  as.data.frame()

datatable(subject_1)
  • Time : the time of drawing the blood sample (i.e. not of measuring its radioactivity).
  • Method : there were two methods of drawing blood samples. One is drawn and measured automatically by an autosampler machine, named \(Continuous\); the other is drawn manually, named \(Discrete\).
  • Count : the number of recorded counts from the gamma counter.
  • aif : the arterial input function estimated in the conventional manner.
  • backg : background. The measuring instrument needs background correction, and this is the estimated background radioactivity in counts.
  • bpr : blood-to-plasma ratio. We were interested in the counts in plasma; however, the machine could only measure counts in whole blood. By measuring radioactivity in both plasma and blood in the manual samples, we model the blood-to-plasma ratio over time. The bpr is then used to correct the samples collected by the machine.
  • ParentFraction : the fraction of radioactivity originating from the parent compound, and not from radioactive daughter compounds (i.e. metabolites).
  • disp_fct : used for dispersion correction. Dispersion correction is calculated as a fraction multiplier for each time point using the original and corrected AIF radioactivity values. In this way, we incorporate dispersion correction into the count model.
  • t_G : the time of measuring counts in plasma for manual samples / counts in blood for automatic samples (i.e. not of drawing the blood sample).
  • vol : the volume of the plasma sample.
  • delta : the duration for which the sample is in the gamma tube (a longer duration results in more counts).
  • tmax : the time at which the AIF reaches its peak.
  • disp_aif : the arterial input function after dispersion correction.

Here I will plot the AIF calculated in the conventional manner, after transforming arterial blood measurements to their respective arterial plasma estimates using the blood-to-plasma ratio, and transforming arterial plasma values to their respective metabolite-corrected estimates using the plasma parent fraction.

ggplot(subject_1, aes(x = time, y = disp_aif))+
  geom_point(shape = "diamond", size = 2)+
  labs(x = "Time", y = "AIF",
       title = "AIF over time for patient jdcs1")+
  theme_light()+
  theme(axis.title.x = element_text(vjust = 0, size = 15),
        axis.title.y = element_text(vjust = 2, size = 15)) # theme() must come after theme_light(), or the complete theme resets these settings

As we can see, the AIF rises sharply and then slowly descends. We call the time at which the AIF reaches its peak \(tmax\). The AIF before the peak is modelled with simple linear regression; we are mainly interested in the AIF after the peak, which we model using parametric and non-parametric approaches. For this reason, we split the curve into ascent (asc) and descent (dsc) portions.

# find tmax and slice the curve
data = data %>% 
  group_by(patient) %>% 
  mutate(count = map(count,~findtmax(.x)), # the tmax is the one with max aif
         count_asc = map(count, ~slice_asc(.x)), # data before tmax
         count_dsc = map(count, ~slice_exp(.x))) # data after tmax and time=time-tmax; t_G=t_G-tmax
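The helpers above are defined in helper_functions.R. As a rough sketch of what they do, inferred from the comments above (hypothetical implementations, not the real definitions):

```r
# Hypothetical sketches of the helpers (the real definitions live in
# helper_functions.R): findtmax() records tmax as the time of maximum AIF,
# and slice_exp() keeps the descent, re-expressing times relative to tmax.
findtmax_sketch <- function(df) {
  df$tmax <- df$time[which.max(df$aif)]  # tmax is the time with maximum aif
  df
}

slice_exp_sketch <- function(df) {
  df <- df[df$time >= df$tmax, ]   # keep data from the peak onwards
  df$time <- df$time - df$tmax     # time = time - tmax
  df$t_G  <- df$t_G - df$tmax      # t_G  = t_G  - tmax
  df
}
```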

Modelling

Offsets

For regression of count data, we use offsets in the model to account for measurement factors: for instance, if a sample spends longer in the counter, more counts will be recorded.

  1. The way we calculate the AIF: \(disp\_aif = \frac{count\times e^{\frac{ln(2)\times t\_G}{20.364}}\times 0.037\times 0.07756609407608\times e^{0.080728586340915\times vol}\times parentFraction}{bpr\times disp\_fct\times delta\times vol}\)

  2. The way we set offsets in the code: offset = log(delta) + log(vol) + log(disp_fct) + (-log(2)/20.364*t_G) + (-log(0.037*0.0776)) + (-0.0807*vol) + (-log(parentFraction)) + log(bpr)

This is a bit complicated to follow. Let’s break that down:

  • \(delta\) : time in the gamma counter. \(Counts\) were divided by \(delta\) when we calculated \(AIF\).
  • \(vol\) : volume of the plasma sample. \(Counts\) were divided by \(vol\) when we calculated \(AIF\).
  • \(disp\_fct\) : dispersion correction factor
  • \(-log(2)/20.364*t\_G\) : decay correction. The radiotracer in the blood or plasma keeps decaying, so we need to correct the measurement back to a common reference time. The formula for decay correction is \(A_t = A_0 \times e ^{-\lambda t}\), where \(A_0\) is the original activity at time zero, \(A_t\) is the activity at time \(t\), \(\lambda\) is the decay constant, and \(t\) is the elapsed time. The decay constant is \(\frac{ln(2)}{t_{1/2}}\), where \(t_{1/2}\) is the half-life of the radioactive material of interest.
  • \(-log(0.037\times0.0776\times e^{0.0807\times vol})\) : calibration of the measured counts from the gamma counter: \(Y = 0.037\times0.07756609407608\times e^{0.080728586340915\times vol}\). The exponential volume calibration is used only for the manual discrete samples, and not the continuous samples. \(0.037\) is used for converting units: 1 \(picocurie\) \(=\) 0.037 \(Bq\).
  • \(parentFraction\) : the estimated metabolite free fraction at each time point.
  • \(bpr\) : the estimated blood-to-plasma ratio at each time point. This is set to 1 for all plasma measurements, and only used for the whole blood measurements which are corrected to their estimated plasma values.
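As a check on the bookkeeping, the offset and the AIF formula should be inverses of one another: exponentiating log(count) minus the offset should reproduce disp_aif. A minimal sketch for a single sample (all numerical values below are made up, for illustration only):

```r
# Check that exp(log(count) - offset) reproduces the conventional AIF formula.
# All sample values below are hypothetical, for illustration only.
count <- 5000           # recorded counts
delta <- 0.5            # duration in the gamma counter (min)
vol   <- 1.2            # plasma sample volume
disp_fct <- 1           # dispersion correction factor
t_G   <- 12             # time of counting since injection (min)
parentFraction <- 0.8   # parent fraction
bpr   <- 1              # blood-to-plasma ratio (plasma sample)

calibration <- 0.037 * 0.0776 * exp(0.0807 * vol)  # discrete-sample calibration

# The offset as used in the model
offset <- log(delta) + log(vol) + log(disp_fct) + (-log(2)/20.364 * t_G) +
  (-log(calibration)) + (-log(parentFraction)) + log(bpr)

aif_from_offset <- exp(log(count) - offset)

# The AIF computed directly from the formula
aif_direct <- count * exp(log(2) * t_G / 20.364) * calibration * parentFraction /
  (bpr * disp_fct * delta * vol)

all.equal(aif_from_offset, aif_direct)  # TRUE
```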

We will add a column for the counter calibration; for the continuous samples, the exponential volume term is omitted (i.e. equal to 1).

data <- data %>% 
  mutate(count_dsc = map(count_dsc, ~.x %>% 
                           mutate(calibration = ifelse(Method=="Continuous",
                                                  yes = 0.037 * 0.0776 ,
                                                  no = 0.037 * 0.0776 * exp(0.0807*vol)))))

Then we will also calculate the AIF based on the counts and the offsets in order to be able to visualise the measured values.

create_aif_values <- function(count_dsc) {
  
  suppressWarnings(
    count_dsc <- count_dsc %>% 
      mutate(total_offset = log(delta)+log(vol)+log(disp_fct)+(-log(2)/20.364*t_G)+
                   (-log(calibration)) +
                   (-log(parentFraction))+log(bpr)) %>% 
      mutate(calc_aif = exp(log(count) - total_offset)) %>% 
      mutate(calc_aif = ifelse(is.nan(calc_aif), yes=0, no=calc_aif))
  )
  
  # return the augmented data explicitly
  count_dsc
}

data <- data %>% 
  mutate(count_dsc = map(count_dsc, create_aif_values))

1. Parametric (tri-exponential) Poisson Regression

pare_data = data %>% 
  group_by(patient) %>% 
  mutate(asc_res = map(count_asc, ~acs_inter(.x)), # detect t0 and interpolate aif between t0 and tmax
         count_asc = map(asc_res, ~.x$data), # add t0 to the data
         asc_mod = map(asc_res, ~.x$segmod), # save model that detecting t0
         
         dsc_mod = map(count_dsc, # fit nonlinear poisson regression for descending part
                       ~kinfitr_gnm(t = .x$time, # time since tmax
                                    t_G = .x$t_G, # time point put in gamma count since injection time
                                    y.sum = .x$count, 
                                    delta = .x$delta, # time in the gamma counter
                                    vol = .x$vol,
                                    calibration = .x$calibration,
                                    pf = .x$parentFraction,
                                    bpr = .x$bpr,
                                    disp = rep(1,nrow(.x))
                                    )
                       ),
         dsc_mod = map(dsc_mod, ~.x$result), # save model fit data after tmax
         asc_pred = map(asc_res, ~.x$pred), # get prediction before tmax
         # get prediction after tmax, contain interpolated aif
         dsc_pred = map2(dsc_mod,count_dsc,~pred_aif(.x,.y))
         
         ) %>% 
  select(-asc_res)
Below, we plot the fits for each patient.

  • Grey dots: the continuous data, with blood samples collected by the autosampler
  • Red dots: the discrete data, with blood samples collected manually by the experimenters
  • Orange line: the predicted values from the parametric regression model
for (i in 1:nrow(pare_data)){
  
  patient = pare_data$patient[i] 

  data_line = pare_data$count_dsc[[i]] %>% as.data.frame()
  
  para_data = pare_data$dsc_pred[[i]]$rsd %>% as.data.frame()
  
  # plot of continuous data
  con_data = data_line %>% filter(Method == "Continuous")
  plot(con_data$time, log(con_data$calc_aif), col = "grey",
       xlab = "time (min)",
       ylab = "log(AIF)",
       xlim = c(0,90),
       ylim = c(-2,5),
       main = paste0("Parametric Regression for patient: ", patient))
  legend(50, 5, c("Continuous data"), text.col = "grey", bty = "n")
  
  # overlay the discrete data on the same axes
  dis_data = data_line %>% filter(Method == "Discrete")
  points(dis_data$time, log(dis_data$calc_aif), col = "red")
  legend(50, 4.5, c("Discrete data"), text.col = "red", bty = "n")
 
  # overlay the parametric regression predictions
  lines(para_data$time_dsc, log(para_data$pred), col = "darkorange2", lwd = 2)
  legend(50, 4, c("Predicted value"), text.col = "darkorange2", bty = "n")
  
}

2. Non-parametric Poisson Regression

For the non-parametric Poisson regression, we used the \(scam\) function to obtain monotone decreasing smooths. We also added a covariate to account for the discrepancy observed between the discrete and continuous data.

Below, we show the fit of the non-parametric Poisson regression.

  • Grey dots: the continuous data, with blood samples collected by the autosampler
  • Red dots: the discrete data, with blood samples collected manually by the experimenters
  • Green line: the predicted values from the non-parametric Poisson regression model with an index variable (Method). The dashed line shows the predicted values for the continuous data, while the solid line shows the predicted values for the discrete data. We consider the discrete data to represent the relevant data-generating process, and the continuous data to be less reliable owing to some additional factor. In this study, it was speculated that the continuous sampler may have been placed too close to participants’ bodies, resulting in extra counts.
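The shape-constrained fit described above can be sketched as follows. This is a minimal illustration with simulated data, not the actual fitting code: it assumes the scam package, uses bs = "mpd" for a monotone-decreasing smooth, and includes Method as a covariate alongside a count offset, mirroring the description above.

```r
library(scam)  # shape-constrained additive models

# Simulated descending count data (hypothetical values, for illustration only)
set.seed(1)
n <- 120
sim <- data.frame(
  time   = rep(seq(0.1, 60, length.out = n / 2), 2),
  Method = factor(rep(c("Continuous", "Discrete"), each = n / 2)),
  delta  = 0.5
)
mu <- exp(8 - 0.08 * sim$time + 0.3 * (sim$Method == "Continuous")) * sim$delta
sim$count <- rpois(n, mu)

# Monotone-decreasing smooth of time, Method covariate, count offset
fit <- scam(count ~ s(time, bs = "mpd") + Method + offset(log(delta)),
            family = poisson, data = sim)

# Predictions for one Method should be non-increasing in time
pred <- predict(fit,
                newdata = data.frame(time = seq(0.1, 60, length.out = 50),
                                     Method = "Discrete", delta = 0.5),
                type = "link")
all(diff(pred) <= 1e-6)  # TRUE: the smooth is monotone decreasing
```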

Problem of over-dispersion

Although the predicted values suggest that the Poisson regression model fits well, the Q-Q plot reveals a problem of over-dispersion.

  • Red lines: the reference line

  • Grey bands: the reference bands

3. Non-parametric Model with Negative Binomial distribution

To resolve the problem of over-dispersion, we changed the Poisson distribution to a negative binomial distribution. Because the \(scam\) package does not support the negative binomial distribution, we also changed the fitting function to \(gam\) from the mgcv package. However, this means that our non-parametric model is no longer shape-constrained.
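A sketch of the corresponding call, assuming the mgcv package (the data and names below are illustrative, not the exact code used here): the family changes from poisson to nb(), which estimates the negative binomial dispersion parameter alongside the smooth.

```r
library(mgcv)

# Simulated over-dispersed count data (hypothetical, for illustration only)
set.seed(1)
n      <- 200
time   <- runif(n, 0.1, 60)
Method <- factor(sample(c("Continuous", "Discrete"), n, replace = TRUE))
mu     <- exp(8 - 0.08 * time + 0.3 * (Method == "Continuous"))
count  <- rnbinom(n, size = 5, mu = mu)  # variance > mean: over-dispersed

# gam() with family = nb() estimates the NB dispersion (theta) from the data
fit_nb <- gam(count ~ s(time) + Method, family = nb())

fit_nb$family$family  # e.g. "Negative Binomial(...)" with the estimated theta
```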

  • Grey dots: the continuous data, with blood samples collected by the autosampler
  • Red dots: the discrete data, with blood samples collected manually by the experimenters
  • Green line: the predicted values from the non-parametric negative binomial regression model with an index variable (Method)

Solved problem of over-dispersion

  • Red lines: the reference line

  • Grey bands: the reference bands

Problem of excessive wiggliness

While this model appears to have resolved the issue of over-dispersion, we were dissatisfied with the fits, as they appear to overfit, with upward deviations along what we would expect to be a practically monotonic descent.

Solved problem of excessive wiggliness

To this end, we log-transformed time within the model in order to distribute the basis functions more evenly across the curve, with respect both to the number of measurements and to the expected wiggliness (i.e. second derivative) of the curve. First, we added tmax back to all patients’ data. Then, we log-transformed the time values and fit the model. The following shows the regression model, the residual plot, and the model fit.

  • Grey dots: the continuous data, with blood samples collected by the autosampler
  • Red dots: the discrete data, with blood samples collected manually by the experimenters
  • Green line: the predicted values from the non-parametric negative binomial regression model with an index variable (Method)
Tmax for each patient

Patient   Tmax          Number_of_data
jdcs1     0.666666667   570
jdcs2     0.716666667   567
mhco1     0.95          553
mhco2     1.066666667   546
rwrd1     1.15          540
rwrd2     1.083333333   544
xehk1     1.35          528
xehk2     1             549
ytdh1     0.833333333   560
ytdh2     0.683333333   569
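The log-time reparameterisation can be sketched as follows (a minimal illustration with simulated data; the tmax value below is made up). Adding tmax back before taking logs keeps the argument of log() positive and gives the early, rapidly-changing part of the curve more resolution:

```r
library(mgcv)

# Simulated descent-phase counts (hypothetical values, for illustration only)
set.seed(1)
tmax     <- 0.9                                  # hypothetical tmax (min)
time_dsc <- seq(0.05, 89, length.out = 150)      # time since tmax
mu       <- exp(7 - 1.2 * log(time_dsc + tmax))
count    <- rnbinom(length(mu), size = 10, mu = mu)

# Smooth over log(time + tmax) rather than time itself: the basis functions
# are then spaced on the log scale, concentrating flexibility near the peak
log_time <- log(time_dsc + tmax)
fit_log  <- gam(count ~ s(log_time), family = nb())
```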

These curves look much more like what we would expect them to look like!

And the QQ plots still appear to look good, shown below.

(Figure: QQ plots for each patient)

4. A model across all patients

Having resolved this issue of wiggliness, we wanted to improve our model further. Especially since our model is no longer shape-constrained, another way to incorporate more conservatism into the model is to model all the individuals at once in a hierarchical model.

Firstly, we found the mean tmax across all patients, and added it back to each patient’s data.
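One way to sketch such a hierarchical fit, assuming mgcv (illustrative data and names, not the exact code used here): a global smooth over log-time plus patient-specific factor-smooth deviations (bs = "fs"), so that individual curves borrow strength from the group:

```r
library(mgcv)

# Simulated multi-patient count data (hypothetical values)
set.seed(1)
patients <- factor(rep(paste0("p", 1:5), each = 60))
lt       <- rep(log(seq(0.1, 60, length.out = 60) + 0.9), 5)   # log(time + tmax)
shift    <- rep(rnorm(5, sd = 0.2), each = 60)                 # patient-level deviations
count    <- rnbinom(length(lt), size = 10, mu = exp(7 - 1.1 * lt + shift))

# Global smooth + patient-specific factor-smooth deviations from it
fit_hier <- gam(count ~ s(lt) + s(lt, patients, bs = "fs", m = 1),
                family = nb(), method = "REML")
```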

GJM: For consistency, it would be nice to see plots like those you show above for the other models too. I think the QQ plot and individual smooths are so simple with one panel for everyone that it’s worth keeping them, but I think it would be nice to show the individual predicted values to get an idea for whether this approach is underfitting the data.

(Figures: model plots for each of the 10 patients)

5. Hierarchical extrapolation

First and last samples

Last samples

First samples

Figure